{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 线性回归中使用梯度下降法(向量化)\n",
    "> 方法使用上一章我们封装好的梯度下降法，我们进一步进行了向量化提取和表示，如下图"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![J的导数dJ的向量化简化1](images/J的导数dJ的向量化简化1.png)\n",
    "\n",
    "> 下面是线性代数的向量化操作\n",
    "\n",
    "![J的导数dJ的向量化简化2](images/J的导数dJ的向量化简化2.png)\n",
    "\n",
    "> 最终的损失函数J的导数dJ的向量化表示\n",
    "\n",
    "![最终的损失函数J的导数dJ的向量化表示](images/最终的损失函数J的导数dJ的向量化表示.png)\n",
    "\n",
    "> 下面是具体的代码实现"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from sklearn import datasets"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "boston = datasets.load_boston()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(490, 13)\n",
      "(490,)\n"
     ]
    }
   ],
   "source": [
    "X = boston.data\n",
    "y = boston.target\n",
    "\n",
    "X = X[y < 50.0]\n",
    "y = y[y < 50.0]\n",
    "print(X.shape)\n",
    "print(y.shape)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "from playML.model_selection import train_test_split"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "X_train, X_test, y_train, y_test  = train_test_split(X, y, seed=666)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 使用常规方法"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "from playML.LinearRegression import LinearRegression"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "lin_reg1 = LinearRegression()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Wall time: 321 ms\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "LinearRegression()"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "%time lin_reg1.fit_normal(X_train, y_train)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.8129794056212907"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "lin_reg1.score(X_test, y_test)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 使用梯度下降法(向量化)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "lin_reg2 = LinearRegression()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "D:\\ProgramData\\Anaconda3\\lib\\site-packages\\numpy\\core\\fromnumeric.py:86: RuntimeWarning: overflow encountered in reduce\n",
      "  return ufunc.reduce(obj, axis, dtype, out, **passkwargs)\n",
      "E:\\code\\github\\algorithms\\Part5Improve\\06-Gradient-Descent\\05-Vectorize-Gradient-Descent\\playML\\LinearRegression.py:32: RuntimeWarning: overflow encountered in square\n",
      "  return np.sum((y - X_b.dot(theta)) ** 2) / len(y)\n",
      "E:\\code\\github\\algorithms\\Part5Improve\\06-Gradient-Descent\\05-Vectorize-Gradient-Descent\\playML\\LinearRegression.py:49: RuntimeWarning: invalid value encountered in double_scalars\n",
      "  if (abs(J(theta, X_b, y) - J(last_theta, X_b, y)) < epsilon):\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "LinearRegression()"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "lin_reg2.fit_gd(X_train, y_train) # 因为数据集的偏差很大，所以会导致结果不收敛"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "LinearRegression()"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "lin_reg2.fit_gd(X_train, y_train, eta=0.000001) # eta太小，结果的准确率很低"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.27586818724477224"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "lin_reg2.score(X_test, y_test) # 可能陷入了局部最优解，最好多循环一些"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Wall time: 22.7 s\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "LinearRegression()"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "%time lin_reg2.fit_gd(X_train, y_train, eta=0.000001, n_iters=1e6)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.7542932581943915"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "lin_reg2.score(X_test, y_test)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 使用梯度下降法之前，最好要先进行数据归一化(强烈推荐)\n",
    "![使用梯度下降法之前最好要先进行数据归一化](images/使用梯度下降法之前最好要先进行数据归一化.png)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.preprocessing import StandardScaler"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [],
   "source": [
    "standardScaler = StandardScaler()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "StandardScaler(copy=True, with_mean=True, with_std=True)"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "standardScaler.fit(X_train)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [],
   "source": [
    "X_train_standard = standardScaler.transform(X_train)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [],
   "source": [
    "lin_reg3 = LinearRegression()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Wall time: 104 ms\n"
     ]
    }
   ],
   "source": [
    "%time lin_reg3 = lin_reg3.fit_gd(X_train_standard, y_train) # 归一化后解决了算法的方差大的问题，不用再循环多次了(n_iters)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [],
   "source": [
    "X_test_standard = standardScaler.transform(X_test)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.8129873310487505"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "lin_reg3.score(X_test_standard, y_test)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 梯度下降法的优势"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> 数据量越大，梯度下降法相对于线性回归法的耗时优势越大"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np \n",
    "from playML.LinearRegression import LinearRegression\n",
    "m = 1000\n",
    "n = 5000\n",
    "\n",
    "big_X = np.random.normal(size=(m, n)) # m行代表样本数, n列代表特征数，每一行代表一个多元线性回归方程\n",
    "true_theta = np.random.uniform(0.0, 100.0, size=n+1) # 生成n+1个0~100的theta，即多元线性方程的系数\n",
    "big_y = big_X.dot(true_theta[1:]) + true_theta[0] + np.random.normal(0., 10.0, size=m) # 利用矩阵运算法计算出预测值y_hat"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Wall time: 2.3 s\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "LinearRegression()"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 先使用传统的方法进行训练\n",
    "big_reg1 = LinearRegression()\n",
    "%time big_reg1.fit_normal(big_X, big_y)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Wall time: 2.3 s\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "LinearRegression()"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "big_reg2 = LinearRegression()\n",
    "%time big_reg2.fit_gd(big_X, big_y)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> 梯度下降法的缺点：每一个样本都参与了运算，这使得当样本数较大时，计算梯度也会很慢，所以有了下一节的`随机梯度下降法`"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
